74 research outputs found

    Penalized wavelets: Embedding wavelets into semiparametric regression

    We introduce the concept of penalized wavelets to facilitate seamless embedding of wavelets into semiparametric regression models. In particular, we show that penalized wavelets are analogous to penalized splines; the latter being the established approach to function estimation in semiparametric regression. They differ only in the type of penalization that is appropriate. This fact is not borne out by the existing wavelet literature, where the regression modelling and fitting issues are overshadowed by computational issues such as efficiency gains afforded by the Discrete Wavelet Transform and partially obscured by a tendency to work in the wavelet coefficient space. With penalized wavelet structure in place, we then show that fitting and inference can be achieved via the same general approaches used for penalized splines: penalized least squares, maximum likelihood and best prediction within a frequentist mixed model framework, and Markov chain Monte Carlo and mean field variational Bayes within a Bayesian framework. Penalized wavelets are also shown to have a close relationship with wide data ("p ≫ n") regression and benefit from ongoing research on that topic.

    Theory of Gaussian variational approximation for a Poisson mixed model

    Likelihood-based inference for the parameters of generalized linear mixed models is hindered by the presence of intractable integrals. Gaussian variational approximation provides a fast and effective means of approximate inference. We provide some theory for this type of approximation for a simple Poisson mixed model. In particular, we establish consistency at rate $m^{-1/2} + n^{-1}$, where $m$ is the number of groups and $n$ is the number of repeated measurements.

    Mean field variational bayes for continuous sparse signal shrinkage: Pitfalls and remedies

    © 2014, Institute of Mathematical Statistics. All rights reserved. We investigate mean field variational approximate Bayesian inference for models that use continuous distributions (Horseshoe, Negative-Exponential-Gamma and Generalized Double Pareto) for sparse signal shrinkage. Our principal finding is that the most natural, and simplest, mean field variational Bayes algorithm can perform quite poorly due to posterior dependence among auxiliary variables. More sophisticated algorithms, based on special functions, are shown to be superior. Continued fraction approximations via Lentz's Algorithm are developed to make the algorithms practical.
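
For readers unfamiliar with Lentz's Algorithm, the following is a minimal sketch of the modified Lentz method for evaluating a generic continued fraction b0 + a1/(b1 + a2/(b2 + ...)). The abstract does not give the continued fractions used for the special functions in the paper, so the example evaluates the golden ratio instead; the function name `lentz` and its parameters are illustrative assumptions, not the authors' code.

```python
import math


def lentz(a, b, b0, tol=1e-12, max_iter=10_000, tiny=1e-300):
    """Evaluate b0 + a(1)/(b(1) + a(2)/(b(2) + ...)) by the modified Lentz method.

    `a` and `b` are callables returning the j-th partial numerator and
    denominator (hypothetical interface chosen for this sketch).
    """
    f = b0 if b0 != 0.0 else tiny
    c, d = f, 0.0
    for j in range(1, max_iter + 1):
        aj, bj = a(j), b(j)
        d = bj + aj * d
        if d == 0.0:          # guard against division by zero
            d = tiny
        c = bj + aj / c
        if c == 0.0:
            c = tiny
        d = 1.0 / d
        delta = c * d         # ratio of successive convergents
        f *= delta
        if abs(delta - 1.0) < tol:
            return f
    raise RuntimeError("continued fraction did not converge")


# Illustration only: 1 + 1/(1 + 1/(1 + ...)) equals the golden ratio.
golden = lentz(lambda j: 1.0, lambda j: 1.0, b0=1.0)
print(golden, (1 + math.sqrt(5)) / 2)
```

The method updates a running product of convergent ratios, so it avoids the overflow that can affect direct evaluation of numerators and denominators.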

    Mean field variational bayes for elaborate distributions

    We develop strategies for mean field variational Bayes approximate inference for Bayesian hierarchical models containing elaborate distributions. We loosely define elaborate distributions to be those having more complicated forms compared with common distributions such as those in the Normal and Gamma families. Examples are Asymmetric Laplace, Skew Normal and Generalized Extreme Value distributions. Such models suffer from the difficulty that the parameter updates do not admit closed form solutions. We circumvent this problem through a combination of (a) specially tailored auxiliary variables, (b) univariate quadrature schemes and (c) finite mixture approximations of troublesome density functions. An accuracy assessment is conducted and the new methodology is illustrated in an application. © 2011 International Society for Bayesian Analysis

    Semiparametric regression analysis via Infer.NET

    © 2018, American Statistical Association. All rights reserved. We provide several examples of Bayesian semiparametric regression analysis via the Infer.NET package for approximate deterministic inference in Bayesian models. The examples are chosen to encompass a wide range of semiparametric regression situations. Infer.NET is shown to produce accurate inference in comparison with Markov chain Monte Carlo via the BUGS package, but to be considerably faster. Potentially, this contribution represents the start of a new era for semiparametric regression, where large and complex analyses are performed via fast Bayesian inference methodology and software, mainly being developed within Machine Learning

    AdaSampling for positive-unlabeled and label noise learning with bioinformatics applications

    © 2018 IEEE. Class labels are required for supervised learning but may be corrupted or missing in various applications. In binary classification, for example, when only a subset of positive instances is labeled whereas the remaining are unlabeled, positive-unlabeled (PU) learning is required to model from both positive and unlabeled data. Similarly, when class labels are corrupted by mislabeled instances, methods are needed for learning in the presence of class label noise (LN). Here we propose adaptive sampling (AdaSampling), a framework for both PU learning and learning with class LN. By iteratively estimating the class mislabeling probability with an adaptive sampling procedure, the proposed method progressively reduces the risk of selecting mislabeled instances for model training and subsequently constructs highly generalizable models even when a large proportion of mislabeled instances is present in the data. We demonstrate the utilities of proposed methods using simulation and benchmark data, and compare them to alternative approaches that are commonly used for PU learning and/or learning with LN. We then introduce two novel bioinformatics applications where AdaSampling is used to: 1) identify kinase-substrates from mass spectrometry-based phosphoproteomics data and 2) predict transcription factor target genes by integrating various next-generation sequencing data

    Platinum-(IV)-derivative satraplatin induced G2/M cell cycle perturbation via p53-p21(waf1/cip1)-independent pathway in human colorectal cancer cells

    Platinum-(IV)-derivative satraplatin represents a new generation of orally available anti-cancer drugs that are under development for the treatment of several cancers. Understanding the mechanisms of cell cycle modulation and apoptosis is necessary to define the mode of action of satraplatin. In this study, we investigate the ability of satraplatin to induce cell cycle perturbation, clonogenicity loss and apoptosis in colorectal cancer (CRC) cells.

    Seasonality in pulmonary tuberculosis among migrant workers entering Kuwait

    Background: There is a paucity of data on seasonal variation in pulmonary tuberculosis (TB) in developing countries, in contrast to the recognized seasonality of TB notification in western societies. This study examined the seasonal pattern in TB diagnosis among migrant workers from developing countries entering Kuwait. Methods: Monthly aggregates of TB diagnosis results for consecutive migrants tested between January 1, 1997 and December 31, 2006 were analyzed. We assessed the amplitude (α) of the sinusoidal oscillation and the time at which maximum (θ°) TB cases were detected using Edwards' test. The adequacy of the hypothesized sinusoidal curve was assessed by the χ² goodness-of-fit test. Results: During the 10-year study period, the proportion (per 100,000) of pulmonary TB cases among the migrants was 198 (4608/2328582) (95% confidence interval: 192–204). The adjusted mean monthly number of pulmonary TB cases was 384. Based on the observed seasonal pattern in the data, the maximum number of TB cases was expected during the last week of April (θ° = 112°; P < 0.001). The amplitude (± se) (α = 0.204 ± 0.04) of the simple harmonic curve showed a 20.4% difference from the mean to the maximum number of TB cases. The peak-to-low ratio of the adjusted number of TB cases was 1.51 (95% CI: 1.39–1.65). The χ² goodness-of-fit test revealed no significant (P > 0.1) departure of observed frequencies from the fitted simple harmonic curve. The seasonal component explained 55% of the total variation in the proportions of TB cases (per 100,000) among the migrants. Conclusion: This regularity of peak seasonality in TB case detection may prove useful to institute measures that warrant a better attendance of migrants. Public health authorities may consider re-allocation of resources in the period of peak seasonality to minimize the risk of Mycobacterium tuberculosis infection to close contacts in this and comparable settings in the region having a similar influx of immigrants from high TB burden countries. Epidemiological surveillance for the TB risk in the migrants in subsequent years and required chemotherapy of detected cases may contribute to global efforts to control this public health menace.
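
The headline figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes the usual simple harmonic model, in which expected cases vary as mean × (1 + α·cos(t − θ)), so that the peak-to-low ratio is (1 + α)/(1 − α). It also assumes that 0° corresponds to January 1 and that 360° spans a 365-day year, which the abstract does not state. Variable names are illustrative.

```python
from datetime import date, timedelta

# Figures quoted in the abstract.
cases, migrants = 4608, 2_328_582
alpha = 0.204        # amplitude of the fitted harmonic curve
theta_deg = 112      # angle of the seasonal peak

# Proportion of pulmonary TB cases per 100,000 migrants.
rate_per_100k = 1e5 * cases / migrants

# Peak-to-low ratio implied by the harmonic model: (1 + alpha) / (1 - alpha).
peak_to_low = (1 + alpha) / (1 - alpha)

# Convert the peak angle to a calendar date (assumption: 0 degrees = January 1,
# non-leap year used purely for illustration).
peak_day_offset = int(theta_deg / 360 * 365)
peak_date = date(1997, 1, 1) + timedelta(days=peak_day_offset)

print(f"rate per 100,000: {rate_per_100k:.1f}")
print(f"peak-to-low ratio: {peak_to_low:.2f}")
print(f"approximate peak date: {peak_date:%B %d}")
```

Under these assumptions the computed rate and ratio agree with the reported 198 per 100,000 and 1.51, and the peak angle falls in late April.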

    An automated Raman-based platform for the sorting of live cells by functional properties

    Stable-isotope probing is widely used to study the function of microbial taxa in their natural environment, but sorting of isotopically labelled microbial cells from complex samples for subsequent genomic analysis or cultivation is still in its early infancy. Here, we introduce an optofluidic platform for automated sorting of stable-isotope-probing-labelled microbial cells, combining microfluidics, optical tweezing and Raman microspectroscopy, which yields live cells suitable for subsequent single-cell genomics, mini-metagenomics or cultivation. We describe the design and optimization of this Raman-activated cell-sorting approach, illustrate its operation with four model bacteria (two intestinal, one soil and one marine) and demonstrate its high sorting accuracy (98.3 ± 1.7%), throughput (200–500 cells h⁻¹; 3.3–8.3 cells min⁻¹) and compatibility with cultivation. Application of this sorting approach for the metagenomic characterization of bacteria involved in mucin degradation in the mouse colon revealed a diverse consortium of bacteria, including several members of the underexplored family Muribaculaceae, highlighting both the complexity of this niche and the potential of Raman-activated cell sorting for identifying key players in targeted processes.

    Phytoremediation using Aquatic Plants
